Ensemble Classification for Relational Domains
Abstract
Ensemble classification methods have been shown to produce more accurate predictions than their base component models (Bauer and Kohavi 1999). Because of this effectiveness, ensemble approaches have been applied in a wide range of domains to improve classification. The expected prediction error of a classification model can be decomposed into bias and variance (Friedman 1997). Ensemble methods that construct component models independently (e.g., bagging) can improve performance by reducing the error due to variance, while methods that construct component models dependently (e.g., boosting) can improve performance by reducing the error due to both bias and variance. Although ensemble methods were initially developed for classifying independent and identically distributed (i.i.d.) data, they can be applied directly to relational data by using a relational classifier as the base component model. This straightforward approach can improve classification for network data, but it suffers from a number of limitations. First, the characteristics of relational data are exploited only by the base relational classifier, not by the ensemble algorithm itself. We note that explicitly accounting for the structured nature of relational data in the ensemble mechanism can significantly improve ensemble classification. Second, ensemble learning methods that assume i.i.d. data can fail to preserve the relational structure of non-i.i.d. data, which will (1) prevent the relational base classifiers from exploiting that structure, and (2) fail to accurately capture properties of the dataset, which can lead to inaccurate models and classifications. Third, ensemble mechanisms that assume i.i.d. data are limited to reducing the errors associated with i.i.d. models and fail to reduce the additional sources of error associated with more powerful models (e.g., collective classification (Sen et al. 2008)).
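The claim that independently constructed ensembles such as bagging reduce error due to variance can be illustrated with a small simulation. The sketch below is not the relational framework studied here: it assumes plain i.i.d. one-dimensional data, a 1-nearest-neighbor base classifier, and hypothetical helper names (`make_training_set`, `one_nn_predict`, `bagged_predict`, `prediction_variance`) chosen only for illustration. It measures how much the prediction at one query point changes across freshly drawn training sets, for a single model and for a bagged average of models.

```python
import random
import statistics


def make_training_set(rng, n=50, noise=0.3):
    """Draw n i.i.d. points: x ~ U(0, 1), label 1 if x > 0.5, flipped with probability `noise`."""
    data = []
    for _ in range(n):
        x = rng.random()
        y = 1 if x > 0.5 else 0
        if rng.random() < noise:
            y = 1 - y
        data.append((x, y))
    return data


def one_nn_predict(data, x0):
    """Return the label of the training point closest to x0 (a high-variance base model)."""
    return min(data, key=lambda point: abs(point[0] - x0))[1]


def bagged_predict(rng, data, x0, n_models=25):
    """Average the 1-NN predictions made on n_models bootstrap resamples of data."""
    votes = []
    for _ in range(n_models):
        sample = [rng.choice(data) for _ in data]  # sample with replacement
        votes.append(one_nn_predict(sample, x0))
    return sum(votes) / n_models


def prediction_variance(n_trials=300, x0=0.75, seed=0):
    """Variance of the prediction at x0 across independently drawn training sets."""
    rng = random.Random(seed)
    single, bagged = [], []
    for _ in range(n_trials):
        data = make_training_set(rng)
        single.append(one_nn_predict(data, x0))
        bagged.append(bagged_predict(rng, data, x0))
    return statistics.pvariance(single), statistics.pvariance(bagged)


single_var, bagged_var = prediction_variance()
print(f"single 1-NN variance: {single_var:.3f}")
print(f"bagged 1-NN variance: {bagged_var:.3f}")
```

In this toy setting the bagged score should vary noticeably less across training sets than the single-model prediction. Because the resampling treats every point as independent, the same procedure applied naively to linked data would break the relational structure, which is the second limitation listed above.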
Our key observation is that collective classification methods have error due to variance in inference (Neville and Jensen 2008). This has been overlooked by current ensemble methods that assume exact inference methods and only focus on the typical goal of reducing errors due to learning, even if the methods explicitly consider relational data (Preisach and Schmidt-Thieme 2006). Here we study the problem of ensemble classification for relational domains by focusing on the reduction of error due to variance. We propose a relational ensemble framework
Similar resources
An ensemble model for collective classification that reduces learning and inference variance
Ensemble learning can improve classification of relational data. Previous attempts to do so include methods that have focused primarily on reducing learning or inference variance, but not both at the same time. We present an ensemble model that reduces error due to variance in both learning and collective inference. Our model uniquely combines two strategies tailored specifically for relational...
Across-Model Collective Ensemble Classification
Ensemble classification methods that independently construct component models (e.g., bagging) improve accuracy over single models by reducing the error due to variance. Some work has been done to extend ensemble techniques for classification in relational domains by taking relational data characteristics or multiple link types into account during model construction. However, since these approac...
Representations and Ensemble Methods for Dynamic Relational Classification
Temporal networks are ubiquitous and evolve over time by the addition, deletion, and changing of links, nodes, and attributes. Although many relational datasets contain temporal information, the majority of existing techniques in relational learning focus on static snapshots and ignore the temporal dynamics. We propose a framework for discovering temporal representations of relational data to i...
Optimum Ensemble Classification for Fully Polarimetric SAR Data Using Global-Local Classification Approach
In this paper, a proposed ensemble classification for fully polarimetric synthetic aperture radar (PolSAR) data using a global-local classification approach is presented. In the first step, to perform the global classification, the training feature space is divided into a specified number of clusters. In the next step to carry out the local classification over each of these clusters, which cont...
Ensemble Classification and Extended Feature Selection for Credit Card Fraud Detection
Due to the rise of technology, the possibility of fraud in different areas such as banking has been increased. Credit card fraud is a crucial problem in banking and its danger is over increasing. This paper proposes an advanced data mining method, considering both feature selection and decision cost for accuracy enhancement of credit card fraud detection. After selecting the best and most effec...